{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train a Model and Predict Device Activities\n",
    "\n",
    "The file works through training a model to detect activities of a given device. An activity is defined as any action a device allow its users to do, and each activity should contain at least three repeated experiments to make representative learnings. \n",
    "\n",
    "**Before you run this notebook, download the required pcap files.** Request the dataset at https://moniotrlab.ccis.neu.edu/imc19/. When access has been granted, download the `iot-model.tgz` archive, decompress it to the current folder. You should expect the file structure to be `traffic/us/yi-camera/{activity_name}/{datetime}.{length}.pcap`.\n",
    "\n",
    "**IMPORTANT** Make sure to use `python3`, and install all the dependencies. \n",
    "- `pip install -r requirements.txt`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Extract pcap files to per-flow level info\n",
    "Output has been truncated because of length."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running ./raw2intermediate.sh...\n",
      "Input files located in: exp_list.txt\n",
      "Output files placed in: tagged-intermediate/us\n",
      "Decoding traffic/us/yi-camera/power/2019-04-25_19:28:58.154s.pcap\n",
      "5\t1556235010.189201000\t6.621162000\teth:ethertype:ip:udp:bootp\t322\tb0:d5:9d:b9:f0:b4\tff:ff:ff:ff:ff:ff\t0.0.0.0\t255.255.255.255\t\t\t\t68\t67\n",
      "6\t1556235010.189291000\t0.000090000\teth:ethertype:ip:icmp:data\t62\t22:ef:03:1a:97:b9\tb0:d5:9d:b9:f0:b4\t192.168.10.254\t192.168.10.204\t\t\t\t\t\n",
      "Line count: 152 tagged-intermediate/us/yi-camera/power/2019-04-25_19:28:58.154s.txt\n",
      "\n",
      "Decoding traffic/us/yi-camera/power/2019-04-25_19:25:30.155s.pcap\n",
      "5\t1556234778.510610000\t1.074978000\teth:ethertype:ip:udp:bootp\t322\tb0:d5:9d:b9:f0:b4\tff:ff:ff:ff:ff:ff\t0.0.0.0\t255.255.255.255\t\t\t\t68\t67\n",
      "6\t1556234778.510734000\t0.000124000\teth:ethertype:ip:icmp:data\t62\t22:ef:03:1a:97:b9\tb0:d5:9d:b9:f0:b4\t192.168.10.254\t192.168.10.204\t\t\t\t\t\n",
      "Line count: 160 tagged-intermediate/us/yi-camera/power/2019-04-25_19:25:30.155s.txt\n",
      "\n",
      "Decoding traffic/us/yi-camera/power/2019-04-25_19:21:40.166s.pcap\n",
      "5\t1556234554.680957000\t1.124584000\teth:ethertype:ip:udp:bootp\t322\tb0:d5:9d:b9:f0:b4\tff:ff:ff:ff:ff:ff\t0.0.0.0\t255.255.255.255\t\t\t\t68\t67\n",
      "6\t1556234554.681112000\t0.000155000\teth:ethertype:ip:icmp:data\t62\t22:ef:03:1a:97:b9\tb0:d5:9d:b9:f0:b4\t192.168.10.254\t192.168.10.204\t\t\t\t\t\n",
      "Line count: 162 tagged-intermediate/us/yi-camera/power/2019-04-25_19:21:40.166s.txt\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!./raw2intermediate.sh exp_list.txt tagged-intermediate/us"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Parse per-flow info to features per-activity\n",
    "Output has been truncated because of length."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running extract_features.py...\n",
      "Input files located in: tagged-intermediate/us/\n",
      "Output files placed in: features/us/\n",
      "mkdir: created directory 'features'\n",
      "mkdir: created directory 'features/us'\n",
      "mkdir: created directory 'features/us//caches'\n",
      "Feature files to be generated from following devices: yi-camera\n",
      "Extracting from ./tagged-intermediate/us/yi-camera/power/2019-04-25_19:25:30.155s.txt\n",
      "   160 packets ./features/us/caches/yi-camera_power_2019-04-25_19:25:30.155s.csv\n",
      "Extracting from ./tagged-intermediate/us/yi-camera/power/2019-04-25_19:21:40.166s.txt\n",
      "   162 packets ./features/us/caches/yi-camera_power_2019-04-25_19:21:40.166s.csv\n",
      "Extracting from ./tagged-intermediate/us/yi-camera/power/2019-04-25_19:28:58.154s.txt\n",
      "   152 packets ./features/us/caches/yi-camera_power_2019-04-25_19:28:58.154s.csv\n",
      "Extracting from ./tagged-intermediate/us/yi-camera/android_wan_photo/2019-04-27_22:08:01.37s.txt\n",
      "   1557 packets ./features/us/caches/yi-camera_android_wan_photo_2019-04-27_22:08:01.37s.csv\n",
      "Extracting from ./tagged-intermediate/us/yi-camera/android_wan_photo/2019-04-27_21:44:25.36s.txt\n",
      "   1573 packets ./features/us/caches/yi-camera_android_wan_photo_2019-04-27_21:44:25.36s.csv\n",
      "Extracting from ./tagged-intermediate/us/yi-camera/android_wan_photo/2019-04-27_22:29:48.37s.txt\n",
      "   1715 packets ./features/us/caches/yi-camera_android_wan_photo_2019-04-27_22:29:48.37s.csv\n"
     ]
    }
   ],
   "source": [
    "!python3 extract_features.py tagged-intermediate/us/ features/us/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Train the model(s) using the features\n",
    "Reruning the command below will skip the model training. Delete .model and .label.txt files in `tagged-models/us/` to retrain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running eval_models.py...\n",
      "Reading command line arguments...\n",
      "Performing error checking on command line arguments...\n",
      "Input files located in: features/us/\n",
      "Output files placed in: tagged-models/us/\n",
      "mkdir: created directory 'tagged-models'\n",
      "mkdir: created directory 'tagged-models/us'\n",
      "mkdir: created directory 'tagged-models/us//output'\n",
      "root_feature: features/us/\n",
      "root_model: tagged-models/us/\n",
      "root_output: tagged-models/us//output\n",
      "yi-camera.csv\n",
      "Training yi-camera using algorithm(s): ['kmeans', 'knn', 'rf']\n",
      "\t#Total data points: 2490 \n",
      "/home/derek/Documents/env3/lib/python3.6/site-packages/sklearn/preprocessing/data.py:617: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.\n",
      "  return self.partial_fit(X, y)\n",
      "/home/derek/Documents/env3/lib/python3.6/site-packages/sklearn/base.py:462: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.\n",
      "  return self.fit(X, **fit_params).transform(X)\n",
      "Train: 1743\n",
      "Test: 747\n",
      "  kmeans: n_clusters=8\n",
      "\tTime to perform tSNE: 104.03s\n",
      "\tSaved the tSNE plot to tagged-models/us//kmeans/kmeans-yi-camera.png\n",
      "    model -> tagged-models/us//kmeans/yi-camerakmeans.model\n",
      "    labels -> tagged-models/us//kmeans/yi-camera.label.txt\n",
      "\tandroid_lan_photo\n",
      "\tandroid_lan_recording\n",
      "\tandroid_lan_watch\n",
      "\tandroid_wan_photo\n",
      "\tandroid_wan_recording\n",
      "\tandroid_wan_watch\n",
      "\tlocal_move\n",
      "\tpower\n",
      "\n",
      "    _homogeneity: 0.562\n",
      "    _completeness: 0.545\n",
      "    _vmeausre: 0.553\n",
      "    _ari: 0.361\n",
      "    _silhouette: 0.268\n",
      "    _acc_score: 0.165\n",
      "    measures saved to: tagged-models/us//kmeans/yi-camera.result.csv\n",
      "  knn: n_neighbors=8\n",
      "\tTime to perform tSNE: 94.27s\n",
      "\tSaved the tSNE plot to tagged-models/us//knn/knn-yi-camera.png\n",
      "    model -> tagged-models/us//knn/yi-cameraknn.model\n",
      "    labels -> tagged-models/us//knn/yi-camera.label.txt\n",
      "\tandroid_lan_photo\n",
      "\tandroid_lan_recording\n",
      "\tandroid_lan_watch\n",
      "\tandroid_wan_photo\n",
      "\tandroid_wan_recording\n",
      "\tandroid_wan_watch\n",
      "\tlocal_move\n",
      "\tpower\n",
      "\n",
      "    _homogeneity: 0.949\n",
      "    _completeness: 0.949\n",
      "    _vmeausre: 0.949\n",
      "    _ari: 0.950\n",
      "    _silhouette: 0.108\n",
      "    _acc_score: 0.980\n",
      "    measures saved to: tagged-models/us//knn/yi-camera.result.csv\n",
      "\tTime to perform tSNE: 113.00s\n",
      "\tSaved the tSNE plot to tagged-models/us//rf/rf-yi-camera.png\n",
      "    model -> tagged-models/us//rf/yi-camerarf.model\n",
      "    labels -> tagged-models/us//rf/yi-camera.label.txt\n",
      "\tandroid_lan_photo\n",
      "\tandroid_lan_recording\n",
      "\tandroid_lan_watch\n",
      "\tandroid_wan_photo\n",
      "\tandroid_wan_recording\n",
      "\tandroid_wan_watch\n",
      "\tlocal_move\n",
      "\tpower\n",
      "\n",
      "    _acc_score: 0.985\n",
      "    measures saved to: tagged-models/us//rf/yi-camera.result.csv\n",
      "Agg saved to tagged-models/us//output/result_kmeans.txt\n",
      "Agg saved to tagged-models/us//output/result_knn.txt\n",
      "Agg saved to tagged-models/us//output/result_rf.txt\n",
      "Time to train all models for 1 devices using 12 threads: 321.81\n"
     ]
    }
   ],
   "source": [
    "!python3 eval_models.py -f features/us/ -m tagged-models/us/ -knr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Predict activities given a pcap file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running predict.py...\n",
      "Input pcap: yi_camera_sample.pcap\n",
      "Input model directory: tagged-models/us//knn\n",
      "Device name: yi-camera\n",
      "Model name: knn\n",
      "Output CSV: results.csv\n",
      "mkdir: created directory 'user-intermediates/'\n",
      "yi-cameraknn.model\n",
      "Model: tagged-models/us//knn/yi-cameraknn.model\n",
      "Total packets: 1621\n",
      "Number of slices: 2\n",
      "predict.py:204: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.\n",
      "  unknown_data = ss.transform(unknown_data)\n",
      "Results:\n",
      "             ts        ts_end  ts_delta  num_pkt              state\n",
      "0  1.556329e+09  1.556329e+09  0.000019     1620  android_lan_watch\n",
      "Results saved to sample.csv\n"
     ]
    }
   ],
   "source": [
    "!python3 predict.py yi_camera_sample.pcap tagged-models/us/ yi-camera rf results.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ts,ts_end,ts_delta,num_pkt,state,device\r\n",
      "1556329377.198794,1556329407.828307,1.9e-05,1620,android_lan_watch,yi-camera\r\n"
     ]
    }
   ],
   "source": [
    "!cat results.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explanation: Between epoch time 1556329377.198794 and 1556329407.828307, the network traffic from yi-camera was predicted to be the same activity as android_lan_watch, which is using the android companion app to watch the video from the camera when both devices are connected to the same WI-FI network."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
